BinDNase: a discriminatory approach for transcription factor binding prediction using DNase I hypersensitivity data
Abstract
MOTIVATION: Transcription factors (TFs) are a class of DNA-binding proteins that play a central role in regulating gene expression. To reveal mechanisms of transcriptional regulation, a number of computational tools have been proposed for predicting TF-DNA interaction sites. Recent studies have shown that genome-wide sequencing data on open chromatin sites from DNase I hypersensitivity experiments (DNase-seq) have great potential to map putative binding sites of all transcription factors in a single experiment. Computational methods that analyse DNase-seq data to accurately map TF-DNA interaction sites are therefore much needed.
RESULTS: Here, we introduce a novel discriminative algorithm, BinDNase, for predicting TF-DNA interaction sites using DNase-seq data. BinDNase implements an efficient method for selecting and extracting informative features from the DNase I signal for each TF, either at single-nucleotide resolution or for larger regions. The method is applied to 57 transcription factors in cell line K562 and 31 transcription factors in cell line HepG2 using data from the ENCODE project. First, we show that BinDNase compares favourably to other supervised and unsupervised methods developed for TF-DNA interaction prediction using DNase-seq data. We demonstrate the importance of modelling each TF with a separate prediction model that reflects TF-specific DNA accessibility around the TF-DNA interaction site. We also show that highly standardised DNase-seq data (pre)processing is a prerequisite for accurate TF binding predictions and that sequencing depth has, on average, only a moderate effect on prediction accuracy. Finally, BinDNase's binding predictions generalise to other cell types, making BinDNase a versatile tool for accurate TF binding prediction.
AVAILABILITY AND IMPLEMENTATION: An R implementation of the algorithm is available at http://research.ics.aalto.fi/csb/software/bindnase/.
CONTACT: [email protected]
SUPPLEMENTARY INFORMATION: Supplemental data are available at Bioinformatics online.
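The abstract describes a per-TF discriminative model built on DNase I signal features taken at single-nucleotide resolution or over larger regions. The following Python sketch illustrates that general idea only; it is not the BinDNase algorithm or its R implementation. The toy cut profiles, the fixed-width binning, the log transform, and the use of plain logistic regression are all assumptions made for illustration, and every function name here is hypothetical.

```python
import math
import random


def bin_counts(cuts, bin_size):
    """Sum per-nucleotide DNase I cut counts into consecutive bins.

    bin_size=1 keeps single-nucleotide resolution; larger values pool regions.
    """
    return [sum(cuts[i:i + bin_size]) for i in range(0, len(cuts), bin_size)]


def simulate_site(rng, bound, width=40, footprint=(15, 25)):
    """Toy cut profile (an assumption, not real data): bound sites have
    accessible flanks and a protected footprint; unbound sites are quiet."""
    profile = []
    for pos in range(width):
        if not bound:
            profile.append(rng.randint(0, 2))
        elif footprint[0] <= pos < footprint[1]:
            profile.append(rng.randint(0, 1))
        else:
            profile.append(rng.randint(2, 6))
    return profile


def column_stats(rows):
    """Per-feature mean and standard deviation, used to standardise inputs."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    sds = [math.sqrt(sum((v - m) ** 2 for v in col) / n) or 1.0
           for col, m in zip(zip(*rows), means)]
    return means, sds


def standardise(row, means, sds):
    return [(v - m) / s for v, m, s in zip(row, means, sds)]


def sigmoid(z):
    # Numerically stable for large negative or positive z.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(weights, bias, x):
    """Probability that the site described by feature vector x is bound."""
    return sigmoid(bias + sum(w * v for w, v in zip(weights, x)))


def train_logistic(X, y, lr=0.1, epochs=300, l2=0.01):
    """Batch gradient descent on L2-regularised logistic loss."""
    weights, bias, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        errors = [predict(weights, bias, x) - t for x, t in zip(X, y)]
        grads = [sum(e * x[j] for e, x in zip(errors, X)) / n
                 for j in range(len(weights))]
        weights = [w - lr * (g + l2 * w) for w, g in zip(weights, grads)]
        bias -= lr * sum(errors) / n
    return weights, bias


def demo(seed=0, bin_size=5, n_train=200, n_test=100):
    """Train one model for one hypothetical TF; return held-out accuracy."""
    rng = random.Random(seed)

    def make(n):
        labels = [i % 2 for i in range(n)]  # alternate unbound / bound
        feats = [[math.log1p(c)
                  for c in bin_counts(simulate_site(rng, bool(t)), bin_size)]
                 for t in labels]
        return feats, labels

    X_train, y_train = make(n_train)
    X_test, y_test = make(n_test)
    means, sds = column_stats(X_train)
    X_train = [standardise(r, means, sds) for r in X_train]
    X_test = [standardise(r, means, sds) for r in X_test]
    weights, bias = train_logistic(X_train, y_train)
    hits = sum((predict(weights, bias, x) >= 0.5) == bool(t)
               for x, t in zip(X_test, y_test))
    return hits / n_test


if __name__ == "__main__":
    print("held-out accuracy on toy data:", demo())
```

In this sketch, training a separate model per TF would amount to calling `train_logistic` once per factor on that factor's labelled sites, and `bin_size` is the knob that moves between single-nucleotide and region-level features.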
Related articles
Understanding Transcription Factor Regulation by Integrating Gene Expression and DNase I Hypersensitive Sites
Transcription factors are proteins that bind to DNA sequences to regulate gene transcription. The transcription factor binding sites are short DNA sequences (5-20 bp long) specifically bound by one or more transcription factors. The identification of transcription factor binding sites and prediction of their function continue to be challenging problems in computational biology. In this study, b...
Are all genetic variants in DNase I sensitivity regions functional?
A detailed mechanistic understanding of the direct functional consequences of DNA variation on gene regulatory mechanism is critical for a complete understanding of complex trait genetics and evolution. Here, we present a novel approach that integrates sequence information and DNase I footprinting data to predict the impact of a sequence change on transcription factor binding. Applying this app...
Differential DNase I hypersensitivity reveals factor-dependent chromatin dynamics.
Transcription factor cistromes are highly cell-type specific. Chromatin accessibility, histone modifications, and nucleosome occupancy have all been found to play a role in defining these binding locations. Here, we show that hormone-induced DNase I hypersensitivity changes (ΔDHS) are highly predictive of androgen receptor (AR) and estrogen receptor 1 (ESR1) binding in prostate cancer and breas...
msCentipede: Modeling heterogeneity across genomic sites improves accuracy in the inference of transcription factor binding
Motivation: Understanding global gene regulation depends critically on accurate annotation of regulatory elements that are functional in a given cell type. CENTIPEDE, a powerful, probabilistic framework for identifying transcription factor binding sites from tissue-specific DNase I cleavage patterns and genomic sequence content, leverages the hypersensitivity of factor-bound chromatin and the i...
Journal: Bioinformatics
Volume 31, Issue 17
Publication date: 2015